Prior to conducting an analysis in R, the data was reviewed and audited in Python (see the Python code). Contributors had entered a wide range of occupations when making their contributions. To make the analysis of contributors' occupations more meaningful, I classified occupations into general categories (e.g., MEDICAL, EDUCATION, RETIRED). The original data was read into a Python script, and a new CSV file was written with the added occupation categories. Contributions with negative amounts were skipped, since these are refunds. The focus is on contributions as a way to gauge the excitement and engagement of individual donors during the election cycle.
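The Python script itself is not reproduced in this report. As a minimal sketch of the kind of step described above, assuming simple keyword matching, the code below adds an occupation category to each row and skips negative (refund) amounts. The keyword lists and the function names `categorize_occupation` and `clean_rows` are hypothetical; only the column names and the category labels come from the data.

```python
import csv
import io

# Hypothetical keyword lists for illustration; the actual script's rules are not shown.
# Order matters: the first matching category wins.
CATEGORY_KEYWORDS = {
    "RETIRED": ["RETIRED"],
    "UNEMPLOYED": ["NOT EMPLOYED", "UNEMPLOYED"],
    "MEDICAL": ["PHYSICIAN", "NURSE", "DENTIST"],
    "EDUCATION": ["TEACHER", "PROFESSOR"],
    "HOMEMAKER": ["HOMEMAKER"],
}


def categorize_occupation(occupation):
    """Return the first category whose keyword appears in the occupation text, else OTHER."""
    text = (occupation or "").upper()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "OTHER"


def clean_rows(reader):
    """Yield rows with an occupation_category column, skipping negative amounts (refunds)."""
    for row in reader:
        if float(row["contb_receipt_amt"]) < 0:
            continue  # refunds are excluded from the analysis
        row["occupation_category"] = categorize_occupation(row["contbr_occupation"])
        yield row


# Tiny illustrative input standing in for the real FEC file.
raw = io.StringIO(
    "contbr_occupation,contb_receipt_amt\n"
    "RETIRED,100\n"
    "PHYSICIAN,-50\n"
    "PROFESSOR,25\n"
    "INFORMATION REQUESTED,30\n"
)
cleaned = list(clean_rows(csv.DictReader(raw)))

# Write the augmented rows back out as CSV (to an in-memory buffer here).
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["contbr_occupation", "contb_receipt_amt", "occupation_category"]
)
writer.writeheader()
writer.writerows(cleaned)
print(out.getvalue())
```

Substring matching is crude (for example, "NURSE" would also match "NURSERY"), so a real mapping would need review against the 858 distinct occupation strings.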
The first stage of the analysis is to look at a summary of the data. Most of the columns are categorical variables, so tables were created for each of them to examine counts and relative frequencies. The contribution amount is a quantitative variable and will be analyzed using measures of center and spread.
The following columns will be explored in this analysis: ‘cand_nm’, ‘contbr_city’, ‘contbr_employer’, ‘contbr_occupation’, ‘occupation_category’, ‘contb_receipt_amt’, ‘contb_receipt_dt’, and ‘election_tp’.
The output below includes the structure of the data, a summary of the data, and two sets of tables. The first set lists the 10 most frequent levels (or all levels, if there are fewer than 10) for each column. The second set lists the same information as percentages. A table of detailed summary statistics for the quantitative variable ‘contb_receipt_amt’ is also included.
## 'data.frame': 9099 obs. of 19 variables:
## $ cmte_id : Factor w/ 18 levels "C00430470","C00430512",..: 16 17 16 14 14 14 14 14 14 14 ...
## $ cand_id : Factor w/ 17 levels "P00003186","P00003251",..: 7 1 7 17 17 17 17 17 17 17 ...
## $ cand_nm : Factor w/ 17 levels "Biden, Joseph R Jr",..: 13 17 13 8 8 8 8 8 8 8 ...
## $ contbr_nm : Factor w/ 2880 levels "ABRAHAM, LAUREL",..: 2168 600 2736 2329 2126 2126 2126 2126 2126 2126 ...
## $ contbr_city : Factor w/ 347 levels "ALBRIGHT","ALDERSON",..: 214 222 282 19 32 32 32 32 32 32 ...
## $ contbr_st : Factor w/ 1 level "WV": 1 1 1 1 1 1 1 1 1 1 ...
## $ contbr_zip : int 252650336 259019766 24976 25801 25818 25818 25818 25818 25818 25818 ...
## $ contbr_employer : Factor w/ 1265 levels "","(NONE)","3M HEALTH INFORMATION SYSTEMS",..: 40 490 403 1 465 465 465 465 465 465 ...
## $ contbr_occupation : Factor w/ 858 levels "","227 CAPITOL ST",..: 258 395 301 698 510 510 510 510 510 510 ...
## $ occupation_category: Factor w/ 12 levels "EDUCATION","EXECUTIVE",..: 6 6 6 9 1 1 1 1 1 1 ...
## $ contb_receipt_amt : num 20 2300 25 50 11 5.18 5 15 1 2.98 ...
## $ contb_receipt_dt : Factor w/ 601 levels "1-Apr-07","1-Apr-08",..: 338 407 257 599 55 94 114 132 132 206 ...
## $ receipt_desc : Factor w/ 10 levels "","REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC)",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_cd : Factor w/ 2 levels "","X": 1 1 1 1 1 1 1 1 1 1 ...
## $ memo_text : Factor w/ 21 levels "","* EARMARKED CONTRIBUTION: SEE BELOW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ form_tp : Factor w/ 2 levels "SA17A","SA18": 1 1 1 1 1 1 1 1 1 1 ...
## $ file_num : int 336959 313015 336959 330012 330012 330012 330012 330012 330012 330012 ...
## $ tran_id : Factor w/ 9109 levels "10000477","10002226",..: 2761 8306 3006 8683 8693 8695 8696 8697 8698 8699 ...
## $ election_tp : Factor w/ 2 levels "G2008","P2008": 2 2 2 2 2 2 2 2 2 2 ...
## cmte_id cand_id cand_nm
## C00431445:5087 P80003338:5087 Obama, Barack :5087
## C00431569:1601 P00003392:1601 Clinton, Hillary Rodham:1601
## C00430470:1082 P80002801:1314 McCain, John S :1314
## C00431205: 302 P40002347: 302 Edwards, John : 302
## C00432914: 252 P80000748: 252 Paul, Ron : 252
## C00446104: 232 P80003478: 127 Huckabee, Mike : 127
## (Other) : 543 (Other) : 416 (Other) : 416
## contbr_nm contbr_city contbr_st
## BRADLEY, ROBERT : 59 CHARLESTON :1375 WV:9099
## STERNS, CAROLYN : 54 MORGANTOWN : 935
## HURSH, DANIEL : 49 HUNTINGTON : 506
## JENNINGS, ALAN : 42 SHEPHERDSTOWN: 364
## PARANAC, LEONARD R. MR.: 39 WHEELING : 262
## RINKER, SARAH : 37 HARPERS FERRY: 253
## (Other) :8819 (Other) :5404
## contbr_zip contbr_employer
## Min. : 0 NOT EMPLOYED :1996
## 1st Qu.:253032865 SELF EMPLOYED : 898
## Median :254431113 RETIRED : 571
## Mean :230753993 : 310
## 3rd Qu.:261478043 INFORMATION REQUESTED: 268
## Max. :268840007 (Other) :5045
## NA's : 11
## contbr_occupation occupation_category contb_receipt_amt
## RETIRED :2484 OTHER :2770 Min. : 1
## ATTORNEY : 603 RETIRED :2489 1st Qu.: 30
## PHYSICIAN : 371 MEDICAL :1467 Median : 100
## NOT EMPLOYED : 225 EDUCATION : 795 Mean : 210
## INFORMATION REQUESTED: 203 EXECUTIVE : 783 3rd Qu.: 200
## (Other) :5207 UNEMPLOYED: 239 Max. :2300
## NA's : 6 (Other) : 556
## contb_receipt_dt receipt_desc memo_cd
## 30-Sep-08: 170 :8686 :8565
## 16-Oct-08: 135 REDESIGNATION FROM PRIMARY : 207 X: 534
## 31-Oct-08: 122 REATTRIBUTION/REDESIGNATION REQUESTED: 81
## 23-Oct-08: 101 REDESIGNATION TO : 56
## 24-Oct-08: 100 REDESIGNATION REQUESTED : 48
## 31-Jul-08: 100 REATTRIBUTION REQUESTED : 13
## (Other) :8371 (Other) : 8
## memo_text form_tp
## :8288 SA17A:8577
## OVF TRANSFER : 298 SA18 : 522
## REDESIGNATION FROM PRIMARY : 207
## REATTRIBUTION/REDESIGNATION REQUESTED: 81
## REDESIGNATION TO : 56
## ORIGINAL TRANSACTION : 54
## (Other) : 115
## file_num tran_id election_tp
## Min. :294891 10000477: 1 G2008:2790
## 1st Qu.:353643 10002226: 1 P2008:6309
## Median :753761 10003971: 1
## Mean :597461 10006491: 1
## 3rd Qu.:753821 10008089: 1
## Max. :877004 10008114: 1
## (Other) :9093
## $cand_nm
## x
## Obama, Barack Clinton, Hillary Rodham McCain, John S
## 5087 1601 1314
## Edwards, John Paul, Ron Huckabee, Mike
## 302 252 127
## Romney, Mitt Giuliani, Rudolph W Thompson, Fred Dalton
## 109 95 67
## Brownback, Samuel Dale
## 46
##
## $contbr_city
## x
## CHARLESTON MORGANTOWN HUNTINGTON SHEPHERDSTOWN WHEELING
## 1375 935 506 364 262
## HARPERS FERRY MARTINSBURG BECKLEY PARKERSBURG CHARLES TOWN
## 253 251 200 188 165
##
## $contbr_employer
## x
## NOT EMPLOYED
## 1996
## SELF EMPLOYED
## 898
## RETIRED
## 571
##
## 310
## INFORMATION REQUESTED
## 268
## WEST VIRGINIA UNIVERSITY
## 262
## SELF-EMPLOYED
## 240
## INFORMATION REQUESTED PER BEST EFFORTS
## 107
## SELF
## 93
## NONE
## 79
##
## $contbr_occupation
## x
## RETIRED ATTORNEY PHYSICIAN
## 2484 603 371
## NOT EMPLOYED INFORMATION REQUESTED PROFESSOR
## 225 203 189
## TEACHER HOMEMAKER
## 189 184 133
## LAWYER
## 111
##
## $occupation_category
## x
## OTHER RETIRED MEDICAL EDUCATION EXECUTIVE UNEMPLOYED
## 2770 2489 1467 795 783 239
## HOMEMAKER LEGAL STUDENT POLITICAL
## 211 162 57 50
##
## $contb_receipt_dt
## x
## 30-Sep-08 16-Oct-08 31-Oct-08 23-Oct-08 24-Oct-08 31-Jul-08 30-Aug-08
## 170 135 122 101 100 100 98
## 29-Sep-08 30-Oct-08 31-Aug-08
## 94 86 85
##
## $election_tp
## x
## P2008 G2008 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 6309 2790 NA NA NA NA NA NA NA NA
## $cand_nm
## x
## Obama, Barack Clinton, Hillary Rodham McCain, John S
## 55.9072426 17.5953401 14.4411474
## Edwards, John Paul, Ron Huckabee, Mike
## 3.3190460 2.7695351 1.3957578
## Romney, Mitt Giuliani, Rudolph W Thompson, Fred Dalton
## 1.1979338 1.0440708 0.7363447
## Brownback, Samuel Dale
## 0.5055501
##
## $contbr_city
## x
## CHARLESTON MORGANTOWN HUNTINGTON SHEPHERDSTOWN WHEELING
## 15.111551 10.275854 5.561051 4.000440 2.879437
## HARPERS FERRY MARTINSBURG BECKLEY PARKERSBURG CHARLES TOWN
## 2.780525 2.758545 2.198044 2.066161 1.813386
##
## $contbr_employer
## x
## NOT EMPLOYED
## 21.9364765
## SELF EMPLOYED
## 9.8692164
## RETIRED
## 6.2754149
##
## 3.4069678
## INFORMATION REQUESTED
## 2.9453786
## WEST VIRGINIA UNIVERSITY
## 2.8794373
## SELF-EMPLOYED
## 2.6376525
## INFORMATION REQUESTED PER BEST EFFORTS
## 1.1759534
## SELF
## 1.0220903
## NONE
## 0.8682273
##
## $contbr_occupation
## x
## RETIRED ATTORNEY PHYSICIAN
## 27.299703 6.627102 4.077371
## NOT EMPLOYED INFORMATION REQUESTED PROFESSOR
## 2.472799 2.231014 2.077151
## TEACHER HOMEMAKER
## 2.077151 2.022200 1.461699
## LAWYER
## 1.219914
##
## $occupation_category
## x
## OTHER RETIRED MEDICAL EDUCATION EXECUTIVE UNEMPLOYED
## 30.4429058 27.3546544 16.1226508 8.7372239 8.6053412 2.6266623
## HOMEMAKER LEGAL STUDENT POLITICAL
## 2.3189361 1.7804154 0.6264425 0.5495109
##
## $contb_receipt_dt
## x
## 30-Sep-08 16-Oct-08 31-Oct-08 23-Oct-08 24-Oct-08 31-Jul-08 30-Aug-08
## 1.8683372 1.4836795 1.3408067 1.1100121 1.0990219 1.0990219 1.0770414
## 29-Sep-08 30-Oct-08 31-Aug-08
## 1.0330806 0.9451588 0.9341686
##
## $election_tp
## x
## P2008 G2008 <NA> <NA> <NA> <NA> <NA> <NA>
## 69.33729 30.66271 NA NA NA NA NA NA
## <NA> <NA>
## NA NA
## nbr.val nbr.null nbr.na min max
## 9099.0 0.0 0.0 1.0 2300.0
## range sum median mean SE.mean
## 2299.0 1910651.2 100.0 210.0 4.2
## CI.mean.0.95 var std.dev coef.var
## 8.2 160423.2 400.5 1.9
## Obama, Barack Clinton, Hillary Rodham McCain, John S
## 38.025 23.954 16.651
## Giuliani, Rudolph W Edwards, John Romney, Mitt
## 5.806 5.545 2.124
## Paul, Ron Thompson, Fred Dalton Brownback, Samuel Dale
## 2.119 2.055 1.378
## Huckabee, Mike Richardson, Bill Biden, Joseph R Jr
## 0.950 0.662 0.304
## Kucinich, Dennis J Tancredo, Thomas Gerald Hunter, Duncan
## 0.211 0.131 0.047
## Gilmore, James S III Dodd, Christopher J
## 0.026 0.012
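The count and percentage tables above come from R. A minimal Python equivalent, assuming a column's values are already in a list, uses `collections.Counter`; the helper name `top_counts` is hypothetical. Percentages are computed against all rows rather than only the top 10, which is how the percentage tables above are scaled (for example, 1,375 / 9,099 ≈ 15.1% for Charleston).

```python
from collections import Counter


def top_counts(values, n=10):
    """Return up to n (level, count, percent) tuples, most frequent first.

    Percent is relative to all values, not just the top n. If there are fewer
    than n levels, fewer tuples are returned (no padding).
    """
    counts = Counter(values)
    total = sum(counts.values())
    return [(level, count, 100 * count / total) for level, count in counts.most_common(n)]


# Rebuild the election_tp column from the counts reported above.
election_tp = ["P2008"] * 6309 + ["G2008"] * 2790
for level, count, pct in top_counts(election_tp):
    print(f"{level}: {count} ({pct:.2f}%)")
```

Indexing a sorted R table past its length returns `NA`, which appears to be what produced the `<NA>` padding in the `election_tp` tables; `most_common(n)` simply returns fewer entries.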
There are 9,099 contributions recorded in this data set for West Virginia for the 2008 primary and general election cycles. Of these, 69% were made during the primaries and 31% during the general election. The highest activity occurred on September 30, 2008, when 170 contributions (1.87% of the total) were made. “Other” is the most common occupation category, at 30.4%, followed by Retired, Medical, Education, and Executive. Unemployed, Homemaker, Legal, Student, Political, and Religious together make up less than 10% of the contributions. Contributors from Charleston, WV, accounted for 15% of the contributions. Barack Obama received a little over half (55.9%) of all contributions by count, but only 38% of the total dollar amount.
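Obama's share by count (about 56%) and his share by dollars (about 38%) differ because they divide by different totals: the number of contributions versus the sum of the amounts. A minimal sketch of both calculations, using hypothetical candidate labels and made-up toy amounts rather than the real data; `shares_by_candidate` is a hypothetical helper.

```python
from collections import defaultdict


def shares_by_candidate(rows):
    """Return {candidate: (percent of contributions, percent of total dollars)}."""
    counts = defaultdict(int)
    amounts = defaultdict(float)
    for candidate, amount in rows:
        counts[candidate] += 1
        amounts[candidate] += amount
    n = sum(counts.values())
    total = sum(amounts.values())
    return {c: (100 * counts[c] / n, 100 * amounts[c] / total) for c in counts}


# Made-up toy rows: "A" gets many small gifts, "B" a few large ones.
rows = [("A", 25.0)] * 6 + [("B", 1000.0)] * 2 + [("C", 250.0)] * 2
for candidate, (by_count, by_amount) in shares_by_candidate(rows).items():
    print(f"{candidate}: {by_count:.1f}% of contributions, {by_amount:.1f}% of dollars")
```

A candidate with many small gifts, like "A" here, ends up with a much larger share by count than by dollars.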
I plotted each of the categorical variables of interest using bar charts. I created a histogram and boxplot for the contribution amounts.
(Figure: bar charts of contribution counts by cand_nm, occupation_category, and election_tp.)
The three bar charts above show the number of contributions by candidate, occupation category, and election type. They give a general sense of how each of these variables relates to the number of contributions. Obama clearly received the most contributions, Other and Retired are the most common occupation categories, and more contributions were made during the primaries.
The plots for city, employer, reported occupation, and contribution date were restricted to the most frequent levels to limit the number of categories on the x-axis. For example, the data set contains 1,265 distinct employers.
It would not be feasible to show all of these in one bar chart. This also justifies grouping occupations into categories to find more meaningful relationships between occupations and contributions.
The plots above show various summary statistics and how they compare across occupation categories, candidates, and cities. I added a dot marking either the mean or the median to highlight any differences between the two measures.
The plot showing mean amounts by candidate was interesting because, for Gilmore, the minimum dot sat exactly on the mean. I checked Gilmore's contributions in the data, and my suspicion was confirmed: he received only one contribution, $500, in April 2007.
## cand_nm contb_receipt_amt contb_receipt_dt
## 8331 Gilmore, James S III 500 19-Apr-07
The plots for individual cities are not very useful, because most of the cities displayed have very few contributions (usually only one), which makes their summary statistics poor descriptors. Instead, I made the same plots using the top cities by number of contributions. Major cities such as Charleston, Morgantown, and Wheeling now appear instead of cities with only a handful of contributions.
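Restricting to the cities with the most contributions before computing summary statistics can be sketched as follows. The function name `stats_for_top_groups`, the toy town names, and the amounts are hypothetical. Without the restriction, a town with a single $2,300 contribution would show a misleadingly large mean and median.

```python
import statistics
from collections import defaultdict


def stats_for_top_groups(rows, n=2):
    """Mean and median amount for the n groups with the most contributions."""
    by_group = defaultdict(list)
    for group, amount in rows:
        by_group[group].append(amount)
    top = sorted(by_group, key=lambda g: len(by_group[g]), reverse=True)[:n]
    return {g: (statistics.mean(by_group[g]), statistics.median(by_group[g])) for g in top}


# Made-up toy rows (town, amount); TINY_TOWN has a single large gift.
rows = (
    [("BIG_TOWN", a) for a in (25.0, 50.0, 100.0, 250.0)]
    + [("MID_TOWN", a) for a in (10.0, 20.0, 30.0)]
    + [("TINY_TOWN", 2300.0)]
)
print(stats_for_top_groups(rows))  # TINY_TOWN is excluded
```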
The final set of univariate plots are basic descriptive plots showing the distribution of the quantitative variable, contribution amount. Histograms and boxplots were created, one with a regular scale and one with a log10 scale, since the data is heavily skewed to the right. On the boxplot, note the darker bands of jittered points. These appear to correspond to commonly given amounts such as $500 or $1,000. The following is a list of the 15 most common contribution amounts.
##
## 100 25 50 250 500 1000 200 30 10 2300 20 150 300 15 35
## 1798 1423 1415 718 454 341 300 272 199 182 150 146 144 98 91
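Two ideas from the preceding paragraph, finding the most common amounts and moving to a log10 scale, can be sketched in Python. The `amounts` list below is rebuilt from four of the top-15 counts above, so it is only a subset of the data, and `log10_amounts` is a hypothetical helper. A log scale is defined only for positive values, which holds here because refunds were removed and the minimum contribution is $1.

```python
import math
from collections import Counter

# Subset rebuilt from four of the top-15 counts above (not the full data set).
amounts = [100.0] * 1798 + [25.0] * 1423 + [50.0] * 1415 + [2300.0] * 182


def log10_amounts(values):
    """Map positive amounts onto a log10 scale: 1 -> 0, 10 -> 1, 100 -> 2, ..."""
    return [math.log10(v) for v in values]


print(Counter(amounts).most_common(3))
print(round(math.log10(2300), 2))  # the $2,300 limit sits at about 3.36 on the log scale
```

On the log scale the full range of $1 to $2,300 spans roughly 0 to 3.36, which is why the log10 histogram and boxplot are easier to read than the raw-scale versions.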
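The cleaned data produced by the Python script contains 9,109 individual contributions; removing the ten contributions of $4,000 or more discussed below leaves the 9,099 analyzed here. There are 19 columns, which are described here.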
The main features of interest are the candidates, contributor occupations, contributor cities, contribution amounts, and contribution dates.
Of the 17 candidates who received contributions in this data, only a handful are probably worth investigating. Only three candidates (Obama, Clinton, and McCain) received over 10% of the total number of contributions, and only five others received more than 1%. This argues for focusing on the data for Obama, McCain, and Clinton. Another reason to focus on the main candidates is that many of the others dropped out early in the primary cycle: Edwards, Romney, and Huckabee dropped out in January, February, and March 2008, respectively (Democratic Primary & Republican Primary). Republicans Brownback, Gilmore, and Tancredo, who also appear in the data, withdrew before the primaries (Withdrew before primaries).
The remaining candidates received so few contributions that it makes sense to focus on the main candidates, Clinton, McCain, and Obama, in later sections.
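The 10% and 1% cut-offs above can be checked directly against the candidate count table. In this sketch, which uses a hypothetical helper `candidates_above`, the counts are the ten largest from the `cand_nm` table. Percentages are taken against all 9,099 contributions, so the seven smaller candidates not listed still count toward the total.

```python
# Ten largest counts from the cand_nm table; seven smaller candidates are omitted.
CANDIDATE_COUNTS = {
    "Obama, Barack": 5087,
    "Clinton, Hillary Rodham": 1601,
    "McCain, John S": 1314,
    "Edwards, John": 302,
    "Paul, Ron": 252,
    "Huckabee, Mike": 127,
    "Romney, Mitt": 109,
    "Giuliani, Rudolph W": 95,
    "Thompson, Fred Dalton": 67,
    "Brownback, Samuel Dale": 46,
}
TOTAL_CONTRIBUTIONS = 9099


def candidates_above(counts, total, threshold):
    """Candidates whose share of `total` contributions exceeds `threshold` percent."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, count in ranked if 100 * count / total > threshold]


main = candidates_above(CANDIDATE_COUNTS, TOTAL_CONTRIBUTIONS, 10)
over_one_percent = candidates_above(CANDIDATE_COUNTS, TOTAL_CONTRIBUTIONS, 1)
print(main)
print(len(over_one_percent) - len(main))  # additional candidates above 1%
```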
Yes. During cleaning and processing, I grouped the occupations into categories. There were too many variations of occupation names, and I wanted to analyze occupation and its effect on contributions.
The distribution of contribution amounts is heavily skewed to the right (mean $210, median $100). This makes sense, since many people probably give small amounts while a few donors give larger amounts, up to the maximum. I did adjust the data before loading it into R: I used Python code to create the occupation categories so that I could complete a more meaningful analysis. The original data has 1,265 different employer names and 858 different occupation names. I created 12 occupation categories based on what I thought were the most prevalent occupations in the data (Other, Retired, Medical, Education, Executive, Unemployed, Homemaker, Legal, Student, Political, Self-Employed, and Religious).
Upon creating the boxplots for contribution amounts, I noticed something strange. There were several contribution amounts at $2300, and then the amounts jumped up to $4000 and $4600. I decided to investigate this further.
## contbr_nm cand_nm contb_receipt_amt
## 257 HILDENBRAND, OLGA I Edwards, John 4600
## 406 PETROPLUS, PARRY G. MR. Giuliani, Rudolph W 4600
## 698 REED, CANDACE MS. Giuliani, Rudolph W 4600
## 700 REED, JAMES W. MR. JR. Giuliani, Rudolph W 4600
## 961 BOLEN, KENNETH MR. Thompson, Fred Dalton 4600
## 2723 MORGAN, CRAIG M. DR. McCain, John S 4000
## 3515 FERRELL, VICKI L Obama, Barack 4600
## 3518 FERRELL, JOE C Obama, Barack 4600
## 4618 UMBERGER, SARAH Obama, Barack 4600
## 4866 SHIMM, DAVID S. MR. McCain, John S 4600
## receipt_desc election_tp
## 257 REATTRIBUTION/REDESIGNATION REQUESTED P2008
## 406 SEE REATTRIBUTION P2008
## 698 P2008
## 700 P2008
## 961 P2008
## 2723 REATTRIBUTION / REDESIGNATION REQUESTED (AUTOMATIC) P2008
## 3515 P2008
## 3518 P2008
## 4618 P2008
## 4866 SEE REATTRIBUTION P2008
Many of these entries required reattribution or redesignation. According to the Federal Election Commission website, this means the contributions exceeded the $2,300 per-election limit on individual contributions that was in place in 2008 (FEC Contribution Limits).
I looked at all of the contributions from one of the names behind these unusual amounts to see what was going on.
## contbr_nm cand_nm contb_receipt_amt
## 256 HILDENBRAND, OLGA I Edwards, John 2300
## 257 HILDENBRAND, OLGA I Edwards, John 4600
## 258 HILDENBRAND, OLGA I Edwards, John 2300
## receipt_desc election_tp contb_receipt_dt
## 256 REATTRIBUTION/REDESIGNATION REQUESTED P2008 30-Jun-07
## 257 REATTRIBUTION/REDESIGNATION REQUESTED P2008 30-Jun-07
## 258 G2008 30-Jun-07
It looks like the $4,600 contribution was reallocated into two separate $2,300 contributions, one for the primary cycle and one for the general cycle, which is allowed since $2,300 is the limit per election. Because the reallocated amounts appear as their own rows, keeping the original row would double-count the money. I decided to remove these rows (contribution amounts of $4,000 or more) from the data set.
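A minimal sketch of the removal step, using the three Hildenbrand rows shown above. It drops rows above the $2,300 per-election limit rather than applying a $4,000 cut-off. In this data set, every amount above $2,300 was $4,000 or $4,600, so both rules select the same rows. The function name `split_over_limit` is hypothetical.

```python
PER_ELECTION_LIMIT = 2300.0  # 2008 individual limit per election, as cited above


def split_over_limit(rows, limit=PER_ELECTION_LIMIT):
    """Return (kept, dropped): rows at or below the limit, and rows above it."""
    kept = [row for row in rows if row["contb_receipt_amt"] <= limit]
    dropped = [row for row in rows if row["contb_receipt_amt"] > limit]
    return kept, dropped


# The three HILDENBRAND, OLGA I rows listed above.
rows = [
    {"contb_receipt_amt": 2300.0, "election_tp": "P2008"},
    {"contb_receipt_amt": 4600.0, "election_tp": "P2008"},
    {"contb_receipt_amt": 2300.0, "election_tp": "G2008"},
]
kept, dropped = split_over_limit(rows)
# After dropping the $4,600 row, one $2,300 row per election remains.
print(sum(row["contb_receipt_amt"] for row in kept))
```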
Since the majority of the variables are categorical, I used stacked bar charts as well as boxplots split by factor levels. I also created some heat maps to see any interesting patterns.
These first two bar charts show the relationship between candidate and city.
The first chart is too hard to read, so I limited the second chart to include the “main” candidates. One can use this type of chart to look at relationships such as how McCain received a larger share of contributions from Charleston than Morgantown. The third chart shows the same type of information but categorized by occupation category. The final bar chart shows the total contribution amounts by election type. It is obvious that more money was contributed for the primary elections.
(Figures: two sets of plots of contribution amount, each with panels for cand_nm, occupation_category, and election_tp.)
I created boxplots of contribution amount by candidate, occupation category, and election type to show how these variables relate to contribution amounts, and I zoomed in to see the candidate boxplots more clearly. Interestingly, about 75% of Obama's contributions are smaller than the median contribution to Clinton or to McCain; that is, Obama's third quartile lies below their medians. The most striking visual is how many more points Obama has overall.
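The observation about Obama's contributions compares one group's third quartile with another group's median. A minimal sketch follows, with made-up toy amounts and a hypothetical helper `q3_below_median`. `method="inclusive"` is an assumption, chosen to approximate the linear-interpolation quartiles that ggplot2's boxplots use; other quartile definitions can give slightly different values.

```python
import statistics


def q3_below_median(group, reference):
    """True if the 75th percentile of `group` is below the median of `reference`."""
    q3 = statistics.quantiles(group, n=4, method="inclusive")[2]
    return q3 < statistics.median(reference)


# Made-up toy amounts, not the real contributions.
many_small = [10, 20, 25, 25, 50, 50, 100]
fewer_large = [50, 100, 250, 500, 1000]

print(q3_below_median(many_small, fewer_large))  # True: Q3 = 50 < median 250
```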
Violin plots of the same data are shown for comparison purposes. One can see the plots get wider at common contribution amounts (see the election type violin plot for a good example of this at $500 and $1000).
I also made heatmaps to show how contributions relate to occupation categories and cities. They confirm that Obama received far more contributions by count.
I looked at many of the relationships between the categorical variables to see whether anything interesting emerged. McCain's share of contributions was larger among Retired contributors than among other occupation categories. He also received a larger share of the contributions from Charleston.
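Statements such as “McCain's share was larger among Retired contributors” compare row percentages of a category-by-candidate cross-tabulation. This is what a proportion-scaled stacked bar chart displays (for example, ggplot2's `position = "fill"`). A minimal sketch with made-up toy rows and a hypothetical helper `candidate_share_within`:

```python
from collections import Counter, defaultdict


def candidate_share_within(rows):
    """Map each category to {candidate: percent of that category's contributions}."""
    table = defaultdict(Counter)
    for category, candidate in rows:
        table[category][candidate] += 1
    shares = {}
    for category, counts in table.items():
        total = sum(counts.values())
        shares[category] = {cand: 100 * k / total for cand, k in counts.items()}
    return shares


# Made-up toy rows (category, candidate), not the real counts.
rows = (
    [("RETIRED", "McCain")] * 3 + [("RETIRED", "Obama")] * 7
    + [("EDUCATION", "McCain")] * 1 + [("EDUCATION", "Obama")] * 9
)
shares = candidate_share_within(rows)
print(shares["RETIRED"]["McCain"], shares["EDUCATION"]["McCain"])  # 30.0 10.0
```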
Contributions were also much greater during the primaries than during the general election. I think two factors explain this: 1) there are far more candidates during the primaries, and 2) because West Virginia is a “red” state, the electorate may be more engaged during the primaries, and many people may not contribute during the general election cycle because they think it will not make a difference.
The strongest relationship I found was that no matter how the data is looked at, Obama tends to dominate the contributions across occupation and city.
I used most of the same bivariate plots, but added faceting in order to add another variable for analysis since most of the variables are categorical.
It looks like Homemakers did not contribute as much to Clinton (speculation: this may be related to gender, but more research and data would be needed). The plot by candidate and date is interesting because I was surprised to see contributions to Clinton at late dates corresponding to the general election.
Charleston appears to have more Legal contributions than the other top cities, and Wheeling appears to have fewer Retired contributions.
More research is needed on the contributions classified as primary contributions in September and October; these may need to be corrected in the data set. A quick look at the contributions made on these dates that are classified as primary contributions shows that at least one has a note that a redesignation was requested, so the same may need to happen for all of them.
## contbr_nm cand_nm contb_receipt_amt
## 569 RIEDEL, PAUL B. MR. McCain, John S 500
## 5909 BARNES, PHYLLIS Clinton, Hillary Rodham 250
## 5943 WICH, JOAN HOHLT MS Clinton, Hillary Rodham 2300
## 5948 BERTINUSON, JANET Clinton, Hillary Rodham 100
## 9087 PLESA, JOHN MR. McCain, John S 25
## 9093 NELSON, ROMEY L. MR. McCain, John S 100
## receipt_desc election_tp contb_receipt_dt
## 569 REDESIGNATION REQUESTED P2008 30-Sep-08
## 5909 P2008 23-Oct-08
## 5943 P2008 16-Oct-08
## 5948 P2008 23-Oct-08
## 9087 P2008 16-Oct-08
## 9093 P2008 16-Oct-08
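One way to list every contribution that is labeled primary (P2008) but dated late in the cycle is to parse the `contb_receipt_dt` strings and filter. The September 1, 2008 cutoff and the helper name `late_primary_rows` are assumptions for illustration, since the text only mentions September and October. The sample rows are copied from listings in this report.

```python
from datetime import date, datetime

# Hypothetical review cutoff; the text only mentions September and October.
REVIEW_CUTOFF = date(2008, 9, 1)


def late_primary_rows(rows, cutoff=REVIEW_CUTOFF):
    """Return P2008 rows whose receipt date is on or after the cutoff."""
    flagged = []
    for row in rows:
        # Dates look like "30-Sep-08"; %b expects English month abbreviations.
        received = datetime.strptime(row["contb_receipt_dt"], "%d-%b-%y").date()
        if row["election_tp"] == "P2008" and received >= cutoff:
            flagged.append(row)
    return flagged


# Sample rows copied from listings in this report.
rows = [
    {"contbr_nm": "RIEDEL, PAUL B. MR.", "election_tp": "P2008", "contb_receipt_dt": "30-Sep-08"},
    {"contbr_nm": "BARNES, PHYLLIS", "election_tp": "P2008", "contb_receipt_dt": "23-Oct-08"},
    {"contbr_nm": "HILDENBRAND, OLGA I", "election_tp": "P2008", "contb_receipt_dt": "30-Jun-07"},
    {"contbr_nm": "HILDENBRAND, OLGA I", "election_tp": "G2008", "contb_receipt_dt": "30-Jun-07"},
]
print([row["contbr_nm"] for row in late_primary_rows(rows)])
```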
The previous two plots are bar charts similar to the box plots. I only created charts of contribution amounts by candidate/election type and by candidate/occupation category, because I did not notice anything particular in the various boxplots of the same variables.
The relationships I examined involved contribution amounts split by two of the categorical variables of interest. Interestingly, McCain did not receive any contributions from the Unemployed or Political categories. I was surprised to see that Clinton received some donations in October, since she was not in the general election.
I was surprised to see that Clinton actually received more contribution money than Obama during the primary election cycle. It was also surprising to see that Obama received a lot of contributions for the general cycle while McCain did not.
No.
The bar chart above shows how contribution amounts differed between the primary and general election cycles in 2008. I chose to show the top 5 candidates by contribution amount to highlight the drop-off between the primary and general cycles. Hillary Clinton received the most money on the Democratic side, and she went on to win the West Virginia primary with 66.93% of the vote (West Virginia Democratic primary, 2008). Notably, Obama had a significant advantage over McCain in contribution amounts for the general election cycle. One hypothesis is that Clinton supporters who gave her large amounts during the primaries shifted their money to Obama in the general election. It didn’t matter: McCain won 55.60% of the vote in West Virginia (United States presidential election in West Virginia, 2008).
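The primary-versus-general comparison is a two-way sum of amounts by candidate and election type, restricted to the largest candidates. A minimal sketch follows, with hypothetical candidate labels, made-up amounts, and a hypothetical helper `totals_by_cycle`. It assumes `election_tp` takes only the two values that appear in this data set.

```python
from collections import defaultdict


def totals_by_cycle(rows, top_n=5):
    """Sum amounts per candidate and election type; keep the top_n candidates by total."""
    # Assumes election_tp is always "P2008" or "G2008", as in this data set.
    totals = defaultdict(lambda: {"P2008": 0.0, "G2008": 0.0})
    for candidate, election_tp, amount in rows:
        totals[candidate][election_tp] += amount
    ranked = sorted(totals, key=lambda c: sum(totals[c].values()), reverse=True)
    return {c: totals[c] for c in ranked[:top_n]}


# Made-up toy rows (candidate, election type, amount).
rows = [
    ("A", "P2008", 2300.0), ("A", "G2008", 500.0),
    ("B", "P2008", 1000.0), ("B", "G2008", 1000.0),
    ("C", "P2008", 100.0),
]
result = totals_by_cycle(rows, top_n=2)
print(result)  # {'A': {'P2008': 2300.0, 'G2008': 500.0}, 'B': {'P2008': 1000.0, 'G2008': 1000.0}}
```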
This plot shows how each of the top three candidates fared with each occupation category in terms of contribution money received. I was surprised that the “religious” category contributed so little compared with other occupation categories (there are many churches in the city where I live). I am also surprised in general at how much more the Democrats received than McCain. One plausible explanation is that, because West Virginia is a “red” state, many Republicans do not feel the need to donate to the Republican candidate since he will win anyway. Obama dominated the major contribution categories of Medical, Other, and Retired. Among Homemakers, McCain roughly kept pace with the Democrats.
This box plot compares the distributions of contribution amounts across occupation categories. The Other, Medical, Executive, Homemaker, Legal, and Self-Employed categories have the highest median contributions, but they are also the most spread out. The Retired, Education, Unemployed, Student, Political, and Religious categories have the lowest median contribution amounts. This makes sense: retired people usually live on lower, fixed incomes, educators often complain about low wages, unemployed people probably cannot afford large contributions, and students typically have little money to spare. I zoomed in to get a better look at, and compare, the bulk of the data in each box plot.
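Ranking categories by median and comparing their spread can be sketched as a per-group median and interquartile range (IQR). The helper name `median_and_iqr` and the toy amounts are hypothetical. `method="inclusive"` is assumed as a close match to ggplot2's boxplot quartiles.

```python
import statistics
from collections import defaultdict


def median_and_iqr(rows):
    """Return (category, median, IQR) tuples, highest median first."""
    groups = defaultdict(list)
    for category, amount in rows:
        groups[category].append(amount)
    summary = []
    for category, amounts in groups.items():
        q1, q2, q3 = statistics.quantiles(amounts, n=4, method="inclusive")
        summary.append((category, q2, q3 - q1))
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Made-up toy amounts, not the real category data.
rows = [("STUDENT", a) for a in (5, 10, 20, 25, 50)] + [
    ("EXECUTIVE", a) for a in (50, 100, 250, 500, 1000)
]
for category, median, iqr in median_and_iqr(rows):
    print(f"{category}: median {median:.0f}, IQR {iqr:.0f}")
```

A larger IQR corresponds to a taller box, which is the sense in which the higher-median categories are "more spread out".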
I selected election contribution data because I was interested in seeing trends. After downloading the data, I quickly realized that many of the techniques I am familiar with (histograms, scatterplots) would not be very useful here, because there is really only one quantitative variable of interest (contribution amount). I had to think carefully about which categorical variables were most important and then determine the best way to use faceting to create meaningful plots. I enjoyed working with this data set, as I learned many useful techniques and tools in ggplot2. I was successful in wrangling the occupation categories to create a more meaningful analysis across occupations. Future analysis could focus on the specific dates within the data set. Time series data for each candidate could be plotted to reveal trends over time. It would be interesting to see which candidates still receive contributions after they drop out of the primary, and how the amounts wane over time. I also discovered that wrangling and cleaning do not end when the analysis starts. I found several instances of inconsistent data, of two kinds. In one case, I was able to fix the errors (the redesignations). In the other, further research is needed to determine why some contributions late in the election cycle (September, October) are classified as primary contributions.
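The suggested time-series follow-up would start by counting contributions per candidate per month. A minimal sketch follows, assuming the `%d-%b-%y` date format seen in `contb_receipt_dt` (e.g. 30-Sep-08). `monthly_counts` is a hypothetical helper, and the sample pairs are copied from listings in this report.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(rows):
    """Count contributions per (candidate, 'YYYY-MM') from dates like '30-Sep-08'."""
    counts = Counter()
    for candidate, receipt_dt in rows:
        month = datetime.strptime(receipt_dt, "%d-%b-%y").strftime("%Y-%m")
        counts[(candidate, month)] += 1
    return counts


# Sample (candidate, date) pairs copied from listings in this report.
rows = [
    ("Gilmore, James S III", "19-Apr-07"),
    ("Edwards, John", "30-Jun-07"),
    ("Edwards, John", "30-Jun-07"),
    ("Clinton, Hillary Rodham", "23-Oct-08"),
    ("Clinton, Hillary Rodham", "16-Oct-08"),
    ("Clinton, Hillary Rodham", "23-Oct-08"),
]
for (candidate, month), n in sorted(monthly_counts(rows).items()):
    print(candidate, month, n)
```

Plotting these counts per candidate over months would show when each campaign's contributions tapered off.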